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(54) A method and device for automatically controlling a region in space 



(57) The monitored region (S) is monitored by image 
signal generating means such as video cameras (2) in 
order to obtain a succession of images of the bodies (A, 
B) present in the monitored region, each image corre- 
sponding to a defined instant. The images are proc- 
essed in such a way as to obtain, for each instant con- 
sidered, a volumetric map of each body present in the 
region (S). This map, which identifies characteristics of 
shape, position, volume and dimensions of the body to 
which it refers, is processed in order to extract from it at 
least one parameter selected from the following group: 
descriptors of shape and volume, such as the volumetric 
map itself, the co-ordinates of position and the dimensions
of each body to which the volumetric map refers. 
The parameter or a succession of values of the param- 
eter obtained in this way is then compared with at least 
one model of these characteristics stored in a processing
unit (1). Depending on the outcome of this comparison
operation, a procedure of surveillance and/or reporting may
be selectively activated. The solution is applicable, for
example, to the automatic monitoring of museum
environments, e.g. to ensure that visitors do not 
come too close to an exhibited work, or to the monitoring 
of industrial environments, e.g. to ensure that an oper- 
ator does not come too close to a dangerous machine 
or process. 
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able by the method in order to define different degrees 
of danger and/or alarm as a consequence of the pres- 
ence of bodies within defined subregions of space. 
[0014] The invention will now be described, purely by 
way of non-restrictive example, with reference to the at- 
tached drawings, in which: 

Figure 1 shows diagrammatically the characteris- 
tics of the system according to the invention used 
for the monitoring and automatic surveillance of a 
defined region of space, 

Figure 2, comprising four parts respectively labelled 
a1-a2 and b1-b2, illustrates the generation of image 
signals within the context of the solution illustrated 
in Figure 1, 

Figures 3 and 4 illustrate, in ways basically identical 
to those of Figures 1 and 2, another possible embodiment 
of the solution according to the invention; in particular, 
Figure 4 is composed of eight parts respectively labelled 
a1-a2, b1-b2, c1-c2 and d1-d2, 

Figure 5 illustrates the methods adopted for calculating 
a so-called map of volumetric occupation, 

Figure 6 illustrates schematically one of these maps 
capable of being obtained within the context of the 
invention, and 

Figure 7 is a flow diagram relating to the generation 
and use of such a map. 

[0015] In particular the expression "volumetric map" 
as used here means any representation of occupied vol- 
umes due to the presence of a body, in other words a 
representation of a three-dimensional map in which the 
regions of volumetric occupation introduced by the pres- 
ence of bodies are indicated. Such a map is obtained 
after image analysis procedures have been carried out 
using automatic methods known per se or according to 
the embodiments of the invention described below. For 
a summary of some of these methods the following may 
usefully be referred to: Marr D., "Vision", Freeman, 
1982; Ballard D.H. and Brown C.M., "Computer Vision", 
Prentice Hall, 1982; Martin W.N. and Aggarwal J.K., 
"Volumetric description of objects from multiple views", 
IEEE Transactions on Pattern Analysis and Machine 
Intelligence, vol. 5, pp. 150-158, 1983. From this map, by 
means of the volumetric analysis carried out using au- 
tomatic methods known per se it is possible to derive 
the characteristics of shape, volume, dimensions and 
position of the bodies present in a defined region of 
space in such a way that they can easily be compared 
with similar representations obtained from the volumet- 
ric maps of other bodies. 
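
Purely by way of illustration, the derivation of volume, dimensions and position from a volumetric map may be sketched as follows. The function name describe_map, the representation of the map as integer cell indices and the use of cubic cells are assumptions made for this example and are not taken from the description.

```python
from typing import Dict, Iterable, Tuple

Cell = Tuple[int, int, int]  # hypothetical integer cell index (i, j, k)


def describe_map(occupied: Iterable[Cell], cell_size: float) -> Dict[str, object]:
    """Derive volume, dimensions and position from the occupied cells.

    Assumes cubic cells of edge `cell_size`; this layout is illustrative only.
    """
    cells = list(occupied)
    if not cells:
        raise ValueError("the volumetric map contains no occupied cell")
    lows = tuple(min(c[a] for c in cells) for a in range(3))
    highs = tuple(max(c[a] for c in cells) for a in range(3))
    # Volume: number of occupied cells times the volume of one cell.
    volume = len(cells) * cell_size ** 3
    # Dimensions: extent of the axis-aligned bounding box along x, y and z.
    dimensions = tuple((highs[a] - lows[a] + 1) * cell_size for a in range(3))
    # Position: centroid of the centres of the occupied cells.
    position = tuple(
        (sum(c[a] for c in cells) / len(cells) + 0.5) * cell_size
        for a in range(3)
    )
    return {"volume": volume, "dimensions": dimensions, "position": position}
```

Descriptors obtained in this way for two maps can then be compared value by value, which is one simple reading of the comparison mentioned above.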

[0016] In both Figure 1 and Figure 3, the reference S 
indicates a region of space in which it is wished to detect 
the presence of people A or objects B. 
[0017] The region S may be bounded by physical 
walls, as for example in the case of a room or cage, or 
may consist simply of a portion of space bounded by an 
imaginary closed surface that separates a generic 
space into two regions, or it may be bounded partly by 
physical barriers, for example the floor, and partly by an 
imaginary surface. In every case, however, the monitored 
region is a volume, may be of any shape and can 
be defined simply and flexibly according to need. 

[0018] In the currently preferred embodiment of the 
invention, the volumes occupied by the bodies (such as 
bodies A, B visible in Figures 1 and 3) that are present 
in the monitored region S are found by using two or more 
video cameras (acting as image signal generating 
means) installed in such a way that the region S is in the 
visual field of at least two video cameras 2, as shown 
for example in Figures 1 and 3 (the latter figure referring 
to a solution in which four video cameras 2 are used). It 
is advisable for the video cameras 2 to be so positioned 
as to avoid occlusions due to the movement of objects 
on the same plane; for example, for bodies of different 
heights standing on the floor and moving about, it is 
preferable to have views from above. 
[0019] The signals (of analogue type or already directly 
converted into digital form) output by the video cameras 2 
are sent to a processing unit 1, which may be a 
specialized processor or, in the currently preferred 
embodiment of the invention, a computer such as a 
programmed personal computer (known per se), in order to 
extract from the images the shapes of the bodies A and 
B present within the region S to be monitored. The object 
here is to check for the possible presence of bodies not 
inherently belonging to the monitored region. 
[0020] In particular, Figures a1 and b1 included in 
Figure 2, and Figures a1, b1, c1 and d1 included in Figure 4, 
show the images produced by the two video cameras 
depicted in Figure 1, on the one hand, and by the four 
video cameras depicted in Figure 3, on the other. 
[0021] Using the signals corresponding to the 
abovementioned images, the unit 1 is able, in accordance with 
known principles (typically by using known image 
processing algorithms), to extract respective sets of data 
representing the abovementioned shapes of the bodies 
present within the region S. For example, Figures a2 
and b2 of Figure 2, and Figures a2, b2, c2 and d2 of 
Figure 4, show the shapes, marked C, of the body A 
present within the region S corresponding to images a1 
and b1, or a1, b1, c1 and d1, produced, from their 
respective points of observation, by the video cameras 2. 
[0022] For an overview of these algorithms, the 
following may usefully be referred to: Huang T.S., "Image 
sequence processing and dynamic scene analysis", 
Springer-Verlag, 1982; Jain A., "Fundamentals of digital 
image processing", Prentice Hall, 1989; Jain J.R., 
Martin W.N. and Aggarwal J.K., "Segmentation 
through the detection of changes due to motion", 
Computer Graphics and Image Processing, vol. 11, pp. 
13-34, 1979; Dubuisson M.-P., "Contour extraction of 
moving objects in complex outdoor scenes", Int. Journal 
of Computer Vision, vol. 14, pp. 83-105, 1995; Bichsel 
M., "Segmenting Simply Connected Moving Objects in 
a Static Scene", IEEE Transactions on Pattern Analysis 
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which the monitored space has been divided are occu- 
pied by objects. The aim of all this is to obtain, as a re- 
sult, the representation seen in Figure 6. Using this tech- 
nique, the volumetric map approximates to the volume 
of actual occupation with a resolution based on the 
dimensions of the cells D1, D2 and on the spatial resolution 
of the means employed to generate the image signals 
(in particular, in the case of video cameras composed 
of a matrix of sensitive points, the resolution in a 
particular direction is given by the ratio of the dimension 
of the observed region to the number of sensitive elements 
in that direction). The dimensions of the cells are 
defined having regard not only to the resolution but also 
to the processing capacity of the unit 1 and to the 
frequency with which the surveillance information is to be 
updated. Also borne in mind is the fact that the overall 
degree of approximation can be enhanced, as already 
stated, by using more video cameras in suitable positions. 
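
As a worked illustration of the ratio just described, the following sketch computes the resolution of a camera in one direction. Taking the coarser of the cell dimension and the camera resolution as the resulting map resolution is an assumption made for this example only; the names pixel_resolution and map_resolution are likewise hypothetical.

```python
def pixel_resolution(region_extent: float, sensitive_elements: int) -> float:
    """Resolution in one direction: observed extent per sensitive element."""
    if sensitive_elements <= 0:
        raise ValueError("the number of sensitive elements must be positive")
    return region_extent / sensitive_elements


def map_resolution(cell_size: float, region_extent: float,
                   sensitive_elements: int) -> float:
    """Assumed rule: the map cannot be finer than its coarsest ingredient."""
    return max(cell_size, pixel_resolution(region_extent, sensitive_elements))
```

For instance, a region 8 m wide observed by 512 sensitive elements gives 8 / 512 = 0.015625 m per element, so that with cells of 0.1 m the cell dimension, not the camera, limits the map under this assumption.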

[0035] In particular, the processing unit 1 can be 
programmed, again in a known manner, to carry out a 
volumetric analysis of bodies for the purpose of recognizing 
the distinctive characteristics of shape and volume of 
the objects analysed. The programming can be done by 
conventional algorithmic approaches, thus coding ab 
initio the characteristics of shape and volume of the bodies 
to be detected into the processing system, or using 
statistical learning techniques such as for example neural 
networks. It is also possible to design the unit 1 such 
that it is able to evaluate the way the positions of the 
bodies examined within the region S are changing in 
space and time and deduce the dynamics of their movements, 
in particular the line, and the direction along this 
line, of their displacements. 
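
By way of a minimal sketch, the line and direction of a displacement may be estimated from the positions of a body at two successive instants. The function name displacement and the use of a single representative point per body (for example its centroid) are assumptions made for this illustration.

```python
import math
from typing import Optional, Tuple

Vector = Tuple[float, float, float]


def displacement(previous: Vector,
                 current: Vector) -> Tuple[float, Optional[Vector]]:
    """Return the distance travelled and the unit direction of travel.

    The direction is None when the body has not moved between the instants.
    """
    delta = tuple(c - p for p, c in zip(previous, current))
    distance = math.sqrt(sum(d * d for d in delta))
    if distance == 0.0:
        return 0.0, None
    return distance, tuple(d / distance for d in delta)
```

Applied to each pair of successive positions, this yields a sequence of directions from which a trajectory could be assessed.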

[0036] This point will become clearer on referring to 
the flow diagram given in Figure 7, which shows, in a 
deliberately schematic way, for ease of comprehension, 
the principles by which the functions of automatic 
monitoring are carried out in the unit 1. 

[0037] Assuming the process to start at a starting step 
100, in a step 101 the unit 1 examines the data set 
corresponding to the images generated by the video cameras 2 
(optionally already processed to refer only to 
moving objects) and in a second step 102 commences 
an action of scanning the region S in such a way as to 
scan the cells D into which the region S has been 
theoretically divided up. As a general rule, each of these cells 
will be identified by three co-ordinates x_i, y_j, z_k within the 
system x, y and z to which the bottom part of Figure 5 
refers. 
[0038] From now on it will be assumed, for simplicity, 
that this scanning operation applies, on each successive 
detection of the images of the region S, to all the 
cells contained within the region S scanned on a "matrix" 
principle, for example in successive lines (co-ordinate 
x), columns (co-ordinate y) and planes (co-ordinate z). 
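
The exhaustive "matrix" scan may be sketched as three nested loops. The choice of letting the x index vary fastest, and the name scan_cells, are assumptions made for this example; the description does not fix the order.

```python
from typing import Iterator, Tuple


def scan_cells(nx: int, ny: int, nz: int) -> Iterator[Tuple[int, int, int]]:
    """Yield every cell index of an nx-by-ny-by-nz grid exactly once."""
    for k in range(nz):          # planes (co-ordinate z)
        for j in range(ny):      # columns (co-ordinate y)
            for i in range(nx):  # lines (co-ordinate x), assumed fastest
                yield (i, j, k)
```
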
[0039] Those skilled in the art of image processing will 
have realized that it is possible (e.g. in order to reduce 
the processing cost and/or speed up the processing) to 
adopt different scanning systems, such as predictive-type 
scanning systems which, once initialized with reference 
to a map of initial volumetric occupation, perform 
subsequent scans only on cells that have some 
likelihood of being significant in the generation of 
subsequent maps, the aim being to avoid the need to perform 
exhaustive scanning of the entire region S for each 
updating operation. 
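
One possible form of predictive scanning is sketched below. It assumes, purely for illustration, that the cells likely to be significant are those lying within a small margin of the cells occupied in the previous map; the name cells_to_scan and the margin parameter are hypothetical.

```python
from itertools import product
from typing import Iterable, Set, Tuple

Cell = Tuple[int, int, int]


def cells_to_scan(previously_occupied: Iterable[Cell], grid_shape: Cell,
                  margin: int = 1) -> Set[Cell]:
    """Return the cells within `margin` cells of a previously occupied cell."""
    nx, ny, nz = grid_shape
    offsets = range(-margin, margin + 1)
    candidates: Set[Cell] = set()
    for i, j, k in previously_occupied:
        for di, dj, dk in product(offsets, repeat=3):
            ci, cj, ck = i + di, j + dj, k + dk
            # Discard neighbours that fall outside the monitored grid.
            if 0 <= ci < nx and 0 <= cj < ny and 0 <= ck < nz:
                candidates.add((ci, cj, ck))
    return candidates
```

With a body occupying a few hundred cells in a grid of many thousands, only the neighbourhood of the body is rescanned at each update.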

[0040] In this context it is also known that it is possible 
to intervene in such a way that, when operating on the 
abovementioned principles, the unit 1 is also capable of 
detecting, for example, the entry into the region S of a 
body not previously present, the aim being to extend the 
scanning action to those cells (previously not included 
in the scanning action) which the body subsequently oc- 
cupies. 

[0041] The steps marked 103_1, 104_1; 103_2, 104_2; ...; 
103_n, 104_n indicate successive processing stages, here 
shown as carried out in parallel, though in fact they can 
be performed serially, and therefore sequentially in time. 
In the course of these steps, for each video camera 1, ..., 
n (n is equal to 2, and to 4, in the illustrative embodiments 
shown in Figures 1 and 3, respectively) and for 
each cell D(x_i, y_j, z_k) that is scanned, the unit 1 checks 
to see whether the cells corresponding to their respective 
images generated by the video cameras 2 can be 
regarded as occupied or unoccupied. 
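
A minimal sketch of the check made for one video camera and one cell is given below. It assumes that the shape extracted from the image is available as a binary mask and that a projection function maps a point of the region S to a pixel; the names cell_seen_occupied and top_view, and the idealized overhead camera, are hypothetical and serve only this example.

```python
import math
from typing import Callable, Optional, Sequence, Tuple

Point = Tuple[float, float, float]
Pixel = Tuple[int, int]  # (row, column)


def cell_seen_occupied(centre: Point,
                       project: Callable[[Point], Pixel],
                       silhouette: Sequence[Sequence[bool]]) -> Optional[bool]:
    """Check one cell against the shape extracted from one camera image.

    Returns None when the cell projects outside the image, i.e. when the
    camera does not cover the cell.
    """
    row, col = project(centre)
    if not (0 <= row < len(silhouette) and 0 <= col < len(silhouette[0])):
        return None
    return bool(silhouette[row][col])


def top_view(point: Point) -> Pixel:
    """Hypothetical overhead camera: one pixel per unit of x and y."""
    x, y, _z = point
    return (math.floor(y), math.floor(x))
```
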
[0042] In the next step, indicated by the general 
reference 105, the results of the comparisons carried out 
in steps 104_1, 104_2, ..., 104_n are processed in order to 
decide whether, on the basis of the image data, the 
scanned cell is to be regarded as occupied or unoccupied 
for the purposes of constructing the map of volumetric 
occupation. 

[0043] The relevant criteria for attributing the "occu- 
pied" or "unoccupied" logic value may differ. 
[0044] On this subject it should be remembered that 
the cells of the region S are not necessarily all covered 
by all of the video cameras 2. As a consequence, in the 
case of certain cells, attribution of the "occupied" value 
may be based on a different number of decision proc- 
esses relating to the individual images than the number 
of images taken into consideration in attributing the "oc- 
cupied" logic value to other cells. 
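
The combination of the per-camera decisions for one cell, allowing for cameras that do not cover the cell, may be sketched as follows. The function name combine, the use of None for a camera without coverage and the treatment of an uncovered cell as unoccupied are assumptions made for this illustration; the two rules shown are the unanimous and majority criteria.

```python
from typing import Optional, Sequence


def combine(decisions: Sequence[Optional[bool]], rule: str = "majority") -> bool:
    """Combine per-camera decisions for one cell into a single logic value.

    `decisions` holds True or False for each camera covering the cell and
    None for each camera that does not cover it.
    """
    covering = [d for d in decisions if d is not None]
    if not covering:
        return False  # assumption: a cell seen by no camera is unoccupied
    if rule == "unanimous":
        return all(covering)
    if rule == "majority":
        return 2 * sum(covering) > len(covering)
    raise ValueError(f"unknown rule: {rule}")
```

For a cell covered by three of four cameras, two of which report occupation, the majority rule gives "occupied" and the unanimous rule gives "unoccupied".
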
[0045] The criterion used in attributing the logic value 
in question may be of unanimous type (the cell is judged 
to be occupied for the purposes of the construction of 
the map of volumetric occupation if and only if all the 
video cameras 2 whose images are taken into account 
produce data corresponding to occupation in the relevant 
image), majority type (the cell is judged to be occupied 
if the majority of video cameras 2 give data indicating 
occupation in the respective images), or correlation 
with the values attributed to adjacent cells (so that 
uncertainty in the attribution of the "occupied" value to 
a cell is resolved on the basis of confident values attrib- 
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be people only. 

[0057] Furthermore, it is possible to detect the simul- 
taneous presence of several bodies, even if of different 
kinds, in the monitored region. The manner in which the 
positions of the bodies change within the monitored region 
can be used to detect violation of predefined subregions. 
It is thus possible to monitor, as has already 
been seen, the presence of a movement of people in 
the vicinity of a machine in an industrial environment and 
activate an alarm signalling procedure whenever at 
least one person comes within a certain distance of that 
machine. 
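
The distance check of the industrial example may be sketched as follows. Representing each person and the machine by a single point, and the names within_distance and alarm_needed, are assumptions made for this illustration.

```python
import math
from typing import Iterable, Tuple

Point = Tuple[float, float, float]


def within_distance(body: Point, machine: Point, limit: float) -> bool:
    """True when the body is at or inside the limit distance of the machine."""
    return math.dist(body, machine) <= limit


def alarm_needed(people: Iterable[Point], machine: Point, limit: float) -> bool:
    """True when at least one person is within the limit distance."""
    return any(within_distance(p, machine, limit) for p in people)
```
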

[0058] The solution described is highly robust and 
overcomes the functional limitations of currently used 
systems. Thus, it is capable of detecting the presence 
and at the same time determining the position of people 
or objects within a defined region of space, discriminating 
between objects and people, between objects or people 
close to each other, and between objects and people 
that move into the monitored region following different 
paths or more generally with behaviours which could 
easily deceive other types of sensor. 
[0059] Those skilled in the art will recognize that the 
method according to the invention can be carried out 
using, at least in part, a computer program capable of 
being run on a computer in such a way that the system 
comprising the program and the computer carries out 
the method according to the invention. The invention 
therefore extends also to such a program capable of be- 
ing loaded into a computer which has the means of or 
is capable of carrying out the method according to the 
invention, as well as to the corresponding information 
technology product comprising a means readable by a 
computer containing codes for a computer program 
which, when the program is loaded into the computer, 
cause the computer to carry out the method according 
to the invention. 

[0060] Clearly, without affecting the principle of the in- 
vention, the constructional details and the embodiments 
may be greatly altered compared to what has been de- 
scribed and illustrated, without thereby departing from 
the scope of the present invention, as defined in the ac- 
companying claims. 



Claims 

1. Method for the detection and location of bodies (A, 
B) in a defined region of space (S), comprising the 
operations of generating (2) image signals capable 
of representing a succession of images of at least 
one body present in the said region (S), each image 
corresponding to a defined instant, characterized in 
that it comprises the following operations: 

- processing (101 to 106) the said image signals 
in such a way as to obtain for each instant taken 
into consideration a volumetric map (F) of the 
said at least one body present in the said region 
(S), the said volumetric map (F) representing 
the shape, position, volume and dimensions of 
the body to which the said volumetric map (F) 
refers, 
- extracting (107) from the said volumetric map 
(F) at least one parameter (P) taken from the 
following group: descriptors of shape and volume, 
such as the volumetric map (F) itself, the 
position co-ordinates and the dimensions of the 
said at least one body to which the said 
volumetric map (F) refers, 
- comparing (108) the said at least one parameter 
(P) with at least one model (D) for compatibility 
of the said volumetric map (F) with predetermined 
conditions of occupation of the said 
region (S), and 
- selectively generating a warning signal (109) 
depending on the outcome of the said 
comparison (108). 

2. Method according to Claim 1, characterized in that 
it comprises the operations of storing volumetric 
maps (F) or successions of the said at least one 
parameter (P) obtained from image signals relating to 
images of the said succession corresponding to 
successive instants, in order to detect changes in 
time in the said volumetric map (F) or the said at 
least one parameter (P), and in that the said model 
is itself generated as a model of changes over time. 

3. Method according to Claim 1 or Claim 2, characterized 
in that it comprises the operation of comparing 
(108) successions of the at least one parameter (P) 
obtained from image signals relating to images of 
the said succession corresponding to successive 
instants with at least one model for compatibility of 
said successions with predetermined conditions of 
occupation of the said region (S). 

4. Method according to Claim 1 or Claim 2, characterized 
in that it comprises the following operations: 

- generating image signals relating to the view of 
the said region (S) from separate observation 
points (2) so as to generate at least two separate 
image signals relating respectively to the 
projections of the same points of the said region 
(S) viewed from separate observation points, 
- processing (1) the said separate image signals 
by finding the match between the projections of 
the same real-world points onto separate images 
with a view to finding their positions in space, 
- obtaining the said volumetric map (F) from the 
said positions in space. 

5. Method according to Claim 1 or Claim 2, character- 
ized in that it comprises the following operations: 
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Claims 1 to 5. 

13. Information technology product comprising a 
means readable by a computer containing codes for 
a computer program which, when the program is 
loaded into the computer, cause the computer to 
carry out the method according to any one of Claims 
1 to 5. 

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